Abstract

Φ-values are experimental measures of how the kinetics of protein folding is changed by single-site mutations. Φ-values measure energetic quantities, but are often interpreted in terms of the structures of the transition state ensemble. Here we describe a simple analytical model of the folding kinetics in terms of the formation of protein substructures. The model shows that Φ-values have both structural and energetic components. In addition, it provides a natural and general interpretation of "nonclassical" Φ-values (i.e., less than zero, or greater than one). The model reproduces the Φ-values for 20 single-residue mutations in the α-helix of the protein CI2, including several nonclassical Φ-values, in good agreement with experiments.



Introduction 

The folding kinetics of small single-domain proteins has been widely studied by single-site mutagenesis [1-16]. The central quantity in these studies, the Φ-value, is given by [17, 18]

$$\Phi = \frac{RT \ln(k_{\mathrm{wt}}/k_{\mathrm{mut}})}{\Delta G_N} \qquad (1)$$

where k_wt and k_mut are the folding rates of the wildtype and mutant protein, and ΔG_N is the change of the protein stability upon mutation. The stability G_N of a protein is the free energy difference between the native state N and the denatured state D.
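Below is a minimal Python sketch of eq. (1). It assumes the rates are given in any consistent unit, ΔG_N is in kcal/mol, and the temperature defaults to 298 K (the temperature quoted for Table 1). The function name and the example rates are hypothetical and serve only as an illustration.

```python
import math

# Gas constant in kcal/(mol K), so that RT and Delta G_N share units.
R_KCAL = 1.987204e-3


def phi_value(k_wt: float, k_mut: float, dG_N: float, T: float = 298.0) -> float:
    """Eq. (1): Phi = RT ln(k_wt / k_mut) / Delta G_N.

    k_wt, k_mut: folding rates of wildtype and mutant (same units).
    dG_N: change of protein stability upon mutation, in kcal/mol.
    """
    return R_KCAL * T * math.log(k_wt / k_mut) / dG_N


# Hypothetical example: the mutation slows folding two-fold and
# destabilizes the native state by 1.0 kcal/mol.
print(round(phi_value(k_wt=50.0, k_mut=25.0, dG_N=1.0), 2))  # 0.41
```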
There are several theoretical studies of Φ-values and transition states. The thermal unfolding kinetics of CI2 has been studied extensively in MD simulations [19-24]. In these simulations, the transition state is defined as a "small ensemble of structures populated immediately prior to the onset of a large structural change" [20] in the unfolding trajectories. Other groups have considered statistical mechanical or Gō-type models [25-36]. In some of these models, transition states are identified as free energy maxima along a folding reaction coordinate, or as free energy saddle points if two or more degrees of freedom are used for the reaction coordinate. More recent approaches define the transition state ensemble from experimental Φ-values by using these Φ-values as restraints in simulations [37-39]. Each of these definitions of the transition state is plausible, but each rests on one or more ad hoc premises.

In classical transition state theory, the folding rate is proportional to exp[−G‡/RT], where G‡ = G_transition state − G_denatured state is the free energy difference between the transition state ensemble and the denatured state. Possible changes in the prefactor of this proportionality relation upon mutation are usually neglected. Thus, Φ = ΔG‡/ΔG_N. In this way, Φ-values measure the energetic consequences of a mutation on the transition state ensemble relative to its consequences on the native state.
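The sketch below makes the transition-state-theory step concrete. It generates hypothetical wildtype and mutant rates from k = A exp[−G‡/RT] and passes them to eq. (1). The energies (kcal/mol), T = 298 K and the prefactor values are made-up assumptions. The sketch only shows two things: Φ reduces to ΔG‡/ΔG_N when the prefactor A is unchanged, and an extra term appears if A does change.

```python
import math

RT = 1.987204e-3 * 298.0  # kcal/mol at 298 K (assumed temperature)


def tst_rate(G_ts: float, prefactor: float = 1.0) -> float:
    """Classical TST: k = prefactor * exp(-G_ts / RT), G_ts relative to D."""
    return prefactor * math.exp(-G_ts / RT)


def phi_from_rates(k_wt: float, k_mut: float, dG_N: float) -> float:
    """Eq. (1) with energies in kcal/mol."""
    return RT * math.log(k_wt / k_mut) / dG_N


# Hypothetical energies (kcal/mol): the mutation raises the barrier by 0.6
# and destabilizes the native state by 1.0.
G_ts_wt, dG_ts, dG_N = 5.0, 0.6, 1.0
k_wt = tst_rate(G_ts_wt)
k_mut = tst_rate(G_ts_wt + dG_ts)
print(phi_from_rates(k_wt, k_mut, dG_N))  # dG_ts / dG_N = 0.6, up to rounding

# If the mutation also changed the prefactor (usually neglected), Phi would
# pick up an extra term RT ln(A_wt / A_mut) / dG_N.
k_mut_A = tst_rate(G_ts_wt + dG_ts, prefactor=0.8)
print(phi_from_rates(k_wt, k_mut_A, dG_N))
```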

A central question is whether Φ-values also give structural information about the transition state ensemble [18,40,41]. In the traditional interpretation, Φ = 1 is taken to indicate that the mutated residue has native-like structure in the transition state ensemble (TSE), while Φ = 0 is taken to indicate that the mutated residue is not structured in the TSE. Typically, experiments give fractional Φ-values between 0 and 1, which apparently indicate partial native-like structural character of the residue in the TSE.

However, there are three problems with this traditional structural interpretation. First, Φ-values are sometimes "nonclassical": they can be less than zero or larger than one. In the traditional view, such values are impossible, because they would imply a transition state that is more denatured than D or more native than N; hence there is some controversy about how such Φ-values should be interpreted. Second, a given sequence position can have very different Φ-values, depending on which amino acid is substituted there. This raises the question of whether such energetic changes always have a simple structural interpretation.

Third, there is a problem of continuity: two residues that are neighbors in the chain sometimes have very different Φ-values. A structural interpretation of this would be that there can be sharp boundaries between native-like and non-native-like structure in the TSE, which seems implausible. For example, the protein CI2 consists of an α-helix packed against a four-stranded β-sheet (see Fig. 1). Twenty single-residue mutations have been studied in the α-helix of CI2, giving Φ-values over the full spectrum from −0.35 to 1.23. Helix formation is usually regarded as fast and cooperative. Even so, these results would seem to imply that this helix does not form as a single cooperative unit, with some parts folded and others unfolded in the TSE. It is not clear whether these problems stem from experimental errors or from the traditional model that is used to interpret Φ-values.

Is there a more physical way to interpret the formation of the protein substructures that make up the TSE of protein folding? Here we develop such a model. We first consider the simplest subdivision of the protein: into one α-helical substructure and one β-sheet substructure. Because of its simplicity, the model can be solved analytically and exactly. We then generalize this model to apply to CI2. Despite its simplicity, this model reproduces the experimental Φ-values in CI2 with a correlation coefficient of 0.85, including some of the nonclassical Φ-values. A key conclusion is that Φ-values cannot be interpreted solely in terms of structures. A Φ-value can, however, be decomposed into structural and energetic components.

The Dynamics 

Our approach has two aspects: (1) the model, which expresses the relative 
free energies of the various substructures of the protein as it folds, and (2) 
the dynamics of the model. We first describe our treatment of the dynamics. 
To simplify the notation, we define here the free energy G_n of each partially folded state n = 1, 2, 3, ..., and the dimensionless free energy g_n = G_n/RT. Both are taken with respect to the fully denatured state, in which none of the substructures is formed. The denatured state is therefore the reference and is defined as having zero free energy. The transition rate from any state m to state n is given by

$$w_{nm} = \frac{1}{t_0}\,\frac{1}{1 + \exp[(G_n - G_m)/RT]} \qquad (2)$$

provided the states n and m are connected via a single step in which only one substructure folds or unfolds [36]. For all other transitions, the transition rates are zero. Here, t_0 is a reference time scale.¹
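The rate rule and the detailed-balance property stated in the footnote can be checked numerically. The sketch below assumes the Glauber-type form w_nm = (1/t_0)/(1 + exp[(G_n − G_m)/RT]) for two states that differ by a single substructure. The free energies and the value RT ≈ 0.59 kcal/mol (298 K) are illustrative assumptions.

```python
import math

RT = 1.987204e-3 * 298.0   # kcal/mol at 298 K (assumed temperature)


def transition_rate(G_to: float, G_from: float, t0: float = 1.0) -> float:
    """w_nm for a single-substructure step from state m to state n.

    Form 1 / (t0 * (1 + exp[(G_n - G_m)/RT])): uphill steps are slow,
    downhill steps approach the limiting rate 1/t0.
    """
    return 1.0 / (t0 * (1.0 + math.exp((G_to - G_from) / RT)))


G_m, G_n = 0.0, 2.5                      # hypothetical free energies (kcal/mol)
w_nm = transition_rate(G_n, G_m)         # m -> n (uphill)
w_mn = transition_rate(G_m, G_n)         # n -> m (downhill)
P_m, P_n = math.exp(-G_m / RT), math.exp(-G_n / RT)   # equilibrium weights

print(w_nm * P_m, w_mn * P_n)   # detailed balance: the two fluxes agree
print(w_nm + w_mn)              # forward + backward rate = 1/t0
```

The sum w_nm + w_mn = 1/t_0 for every connected pair is a property of this particular rate form. It is one reason why the eigenvalues of the four-state model below take such simple values.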

¹The transition rates obey detailed balance, w_nm P_m = w_mn P_n, where P_n ∼ exp[−G_n/(RT)] is the equilibrium weight of state n. Detailed balance ensures that the system ultimately reaches thermal equilibrium.

The folding kinetics is described by the master equation

$$\frac{d\mathbf{P}(t)}{dt} = -\mathbf{W}\,\mathbf{P}(t) \qquad (3)$$

The elements of the vector P(t) are the probabilities P_n(t) that the protein is in state n at time t. The matrix elements of W are W_nm = −w_nm for n ≠ m and W_nn = Σ_{m≠n} w_mn. The general solution of the master equation is

$$\mathbf{P}(t) = \sum_{\lambda} c_{\lambda}\,\mathbf{Y}_{\lambda}\,\exp[-\lambda t] \qquad (4)$$

which is expressed in terms of the eigenvalues λ and eigenvectors Y_λ of the matrix W. The prefactors c_λ depend on the initial conditions at time t = 0.

The eigenvalues represent relaxation rates. It can be shown that one eigenvalue is zero, corresponding to the equilibrium distribution, while all other eigenvalues are positive [42]. For t → ∞, the probability vector P(t) tends to c_0 Y_0, where Y_0 is the eigenvector with eigenvalue 0.



The Model: Two Substructures



The dynamics above is applicable to any model of the protein, its substructures, and their relative free energies. Here we first apply the dynamics to the simplest possible model of the substructures of CI2. The model has four states: (1) the denatured state D, in which neither the helix nor the sheet is formed; (2) a partially folded state α, in which only the helix is formed; (3) a partially folded state β, in which only the β-sheet is formed; and (4) the native state N, in which the helix and the sheet are both formed and packed against each other.

In this simple four-state model, the energy landscape is characterized by the dimensionless free energy differences g_α, g_β, and g_N of the states α, β, and N. Each is taken with respect to the denatured state D, which is defined as having zero free energy.

The folding kinetics of this model can be solved exactly by determining the eigenvalues λ and eigenvectors Y_λ of the matrix W. Since this model has four states, W is a 4 × 4 matrix. In units of 1/t_0, the eigenvalues are λ = 0, 1 − q, 1 + q, and 2, where

$$q = \frac{1 - e^{g_N - g_\alpha - g_\beta}}{\sqrt{(1 + e^{-g_\alpha})(1 + e^{-g_\beta})(1 + e^{g_N - g_\alpha})(1 + e^{g_N - g_\beta})}} \qquad (5)$$
Since −1 < q < 1, the three nonzero eigenvalues are positive and describe the relaxation to the equilibrium state of the model (see eq. (4)). The equilibrium state is simply c_0 Y_0, where Y_0 is the eigenvector with eigenvalue 0.
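The exact eigenvalues can be cross-checked by building the 4 × 4 matrix W directly and diagonalizing it. The following NumPy sketch assumes the rate form of eq. (2) in units of 1/t_0 (so t_0 = 1) and an arbitrary, hypothetical landscape g_α = 3, g_β = 4, g_N = −5. It compares the numerical spectrum with 0, 1 − q, 1 + q, 2 from eq. (5), and propagates eq. (4) starting from the denatured state. The helper names are illustrative.

```python
import numpy as np


def rate(g_to: float, g_from: float) -> float:
    """Eq. (2) in units of 1/t0 with dimensionless free energies."""
    return 1.0 / (1.0 + np.exp(g_to - g_from))


def four_state_matrix(g_a: float, g_b: float, g_N: float) -> np.ndarray:
    """Matrix W of eq. (3) for the states (D, alpha, beta, N), with g_D = 0.

    Allowed single-substructure steps: D<->alpha, D<->beta, alpha<->N, beta<->N.
    W[n, m] = -w_nm for n != m and W[m, m] = sum over n of w_nm (total outflow).
    """
    g = np.array([0.0, g_a, g_b, g_N])
    W = np.zeros((4, 4))
    for i, j in [(0, 1), (0, 2), (1, 3), (2, 3)]:
        for m, n in [(i, j), (j, i)]:        # transition m -> n
            w = rate(g[n], g[m])
            W[n, m] -= w
            W[m, m] += w
    return W


def q_factor(g_a: float, g_b: float, g_N: float) -> float:
    """Eq. (5)."""
    num = 1.0 - np.exp(g_N - g_a - g_b)
    den = np.sqrt((1 + np.exp(-g_a)) * (1 + np.exp(-g_b))
                  * (1 + np.exp(g_N - g_a)) * (1 + np.exp(g_N - g_b)))
    return float(num / den)


g_a, g_b, g_N = 3.0, 4.0, -5.0          # hypothetical landscape
W = four_state_matrix(g_a, g_b, g_N)
numeric = np.sort(np.linalg.eigvals(W).real)
q = q_factor(g_a, g_b, g_N)
analytic = np.sort([0.0, 1.0 - q, 1.0 + q, 2.0])
print(numeric)
print(analytic)

# The zero mode is the Boltzmann distribution P_n ~ exp(-g_n).
p_eq = np.exp(-np.array([0.0, g_a, g_b, g_N]))
p_eq /= p_eq.sum()

# Eq. (4): expand the initial condition (fully denatured) in eigenvectors.
lam, V = np.linalg.eig(W)
c = np.linalg.solve(V, np.array([1.0, 0.0, 0.0, 0.0]))


def P(t: float) -> np.ndarray:
    """Probability vector P(t) = sum_lambda c_lambda Y_lambda exp(-lambda t)."""
    return (V @ (c * np.exp(-lam * t))).real


print(P(1000.0))   # relaxed to p_eq
```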

This model exhibits two-state folding kinetics under two conditions. First, the native state must be stable: its free energy must be significantly lower than the free energies of the other three states. Under such folding conditions, the native state is more populated at equilibrium than the other three states. Second, the intermediate states α and β must have positive free energies relative to D. This creates a kinetic barrier, which is required for single-exponential dynamics.

Under these two conditions, the three Boltzmann weights e^{g_N − g_α − g_β}, e^{g_N − g_α}, and e^{g_N − g_β} in eq. (5) are much smaller than 1, and also much smaller than e^{−g_α} and e^{−g_β}. These three Boltzmann weights can therefore be neglected, and we set them to zero. The factor q in eq. (5) then simplifies to

$$q \approx \frac{1}{\sqrt{(1 + e^{-g_\alpha})(1 + e^{-g_\beta})}} \qquad (6)$$

For large barrier energies g_α and g_β, we have e^{−g_α} ≪ 1 and e^{−g_β} ≪ 1, and therefore (1 + e^{−g_α})(1 + e^{−g_β}) ≈ 1 + e^{−g_α} + e^{−g_β}. We then apply the expansion (1 + x)^{−1/2} ≈ 1 − x/2 with x = e^{−g_α} + e^{−g_β} ≪ 1. This gives the smallest nonzero relaxation rate, or folding rate, k = 1 − q (in units of 1/t_0):

$$k \approx \frac{1}{2}\left(e^{-g_\alpha} + e^{-g_\beta}\right) \qquad (7)$$

The folding rate k is much smaller than the other two relaxation rates, 1 + q and 2. These two fast relaxations then constitute an initial 'burst phase'. Apart from this phase, the model gives two-state, single-exponential folding behavior with the slowest rate k (see eq. (4)). The folding rate k is simply the sum of the rates of the two possible folding routes: one in which α forms first and the other in which β forms first. The factor 1/2 in eq. (7) arises because a molecule that reaches one of the barrier states α or β either falls back to D or falls forward to N, with almost equal probability.
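We can check the quality of the two-state approximation of eq. (7) by comparing it with the exact rate k = 1 − q from eq. (5). This sketch uses hypothetical dimensionless free energies with a very stable native state (g_N = −10). As the barriers g_α and g_β grow, the relative error should shrink.

```python
import math


def q_factor(g_a: float, g_b: float, g_N: float) -> float:
    """Eq. (5); all free energies dimensionless and relative to D."""
    num = 1.0 - math.exp(g_N - g_a - g_b)
    den = math.sqrt((1 + math.exp(-g_a)) * (1 + math.exp(-g_b))
                    * (1 + math.exp(g_N - g_a)) * (1 + math.exp(g_N - g_b)))
    return num / den


def k_exact(g_a: float, g_b: float, g_N: float) -> float:
    """Slowest nonzero relaxation rate 1 - q, in units of 1/t0."""
    return 1.0 - q_factor(g_a, g_b, g_N)


def k_approx(g_a: float, g_b: float) -> float:
    """Eq. (7): two parallel routes, each weighted by the factor 1/2."""
    return 0.5 * (math.exp(-g_a) + math.exp(-g_b))


# Illustrative (hypothetical) landscapes with increasing barriers.
for g_a, g_b in [(3.0, 4.0), (5.0, 6.0), (8.0, 8.0)]:
    ex, ap = k_exact(g_a, g_b, -10.0), k_approx(g_a, g_b)
    print(f"g_a={g_a}, g_b={g_b}: exact {ex:.3e}, eq. (7) {ap:.3e}, "
          f"rel. error {abs(ex - ap) / ex:.1e}")
```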

Using this model, we now explore the effects of mutations. Consider a mutation within the α-helix. The free energy of the helix will change from g_α to g_α + Δg_α, and the free energy of the native state will change from g_N to g_N + Δg_N. In contrast, g_β is not affected by the mutation. The folding rate of the mutant will be k_mut = k(g_α + Δg_α, g_β), with k given by eq. (7).
For small perturbations Δg_α, we have ln k_wt − ln k_mut ≈ −(∂ ln k/∂g_α) Δg_α. For mutations in the α-helix, the Φ-value defined in eq. (1) thus has the general form

$$\Phi = \chi_\alpha \frac{\Delta g_\alpha}{\Delta g_N} \qquad (8)$$

with

$$\chi_\alpha = -\frac{\partial \ln k}{\partial g_\alpha} = \frac{e^{-g_\alpha}}{e^{-g_\alpha} + e^{-g_\beta}} \qquad (9)$$
Hence, the Φ-value is a product of two terms: a structural factor χ_α and an energetic factor Δg_α/Δg_N. The term χ_α describes the fractional structure formation of the α-helix within the TS ensemble. In this example, the TSE consists of the two barrier states α and β on the two parallel folding routes. χ_α ranges between 0 and 1. We have χ_α ≈ 1 for g_α ≪ g_β, when the state α dominates the TSE, and χ_α ≈ 0 for g_α ≫ g_β, when β dominates the TSE.
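The decomposition of eq. (8) can be checked against a finite mutation in the two-route model. The sketch below assumes the rate of eq. (7), hypothetical energies in units of RT, and a small helix perturbation. It computes Φ directly from the rate change via eq. (1) and compares the result with χ_α Δg_α/Δg_N.

```python
import math


def k_two_routes(g_a: float, g_b: float) -> float:
    """Eq. (7), in units of 1/t0."""
    return 0.5 * (math.exp(-g_a) + math.exp(-g_b))


def chi_alpha(g_a: float, g_b: float) -> float:
    """Eq. (9): share of the folding flux that passes through barrier state alpha."""
    return math.exp(-g_a) / (math.exp(-g_a) + math.exp(-g_b))


def phi_helix(g_a: float, g_b: float, dg_a: float, dg_N: float) -> float:
    """Phi from the rate change itself (eq. (1) in dimensionless form)."""
    return math.log(k_two_routes(g_a, g_b) / k_two_routes(g_a + dg_a, g_b)) / dg_N


g_a, g_b = 4.0, 6.0          # hypothetical barrier energies (units of RT)
dg_a, dg_N = 0.05, 0.10      # hypothetical mutational changes
print(chi_alpha(g_a, g_b))                 # about 0.88: the alpha route dominates
print(phi_helix(g_a, g_b, dg_a, dg_N))     # direct evaluation
print(chi_alpha(g_a, g_b) * dg_a / dg_N)   # eq. (8)
```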

Whereas χ_α gives structural information, the second term, Δg_α/Δg_N, can take on either negative or positive values. This term thus accounts for nonclassical Φ-values smaller than 0 or larger than 1. In the simplest case, we have Δg_N = Δg_α + Δg_αβ, where Δg_αβ is, for example, the free energy change for a tertiary contact between the α-helix and the β-sheet. In that case, negative Φ-values arise when Δg_αβ is opposite in sign to Δg_α and larger in magnitude. That is, a negative Φ-value is predicted when a helical mutation also has a counteracting and larger effect on a tertiary contact. Correspondingly, Φ > 1 occurs when two conditions are met: (1) Δg_αβ is opposite in sign to Δg_α but smaller in magnitude, and (2) χ_α is sufficiently large. This explanation of nonclassical Φ-values may also explain why more Φ-values are negative than larger than 1 [42]. If Δg_α and Δg_αβ have similar magnitudes, the latter two conditions should be more difficult to satisfy than the former one.
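The sign arguments above follow directly from eq. (8) with Δg_N = Δg_α + Δg_αβ. The short sketch below evaluates this expression for a hypothetical χ_α = 0.9 and Δg_α = 1 (units of RT), with three choices of Δg_αβ. These give a classical case, a Φ > 1 case and a Φ < 0 case. All numbers are illustrative only.

```python
def phi_from_contacts(chi_a: float, dg_a: float, dg_ab: float) -> float:
    """Eq. (8) with Delta g_N = Delta g_alpha + Delta g_alphabeta."""
    return chi_a * dg_a / (dg_a + dg_ab)


chi_a, dg_a = 0.9, 1.0   # hypothetical values
for dg_ab in (0.5, -0.3, -1.5):
    print(dg_ab, round(phi_from_contacts(chi_a, dg_a, dg_ab), 2))
# 0.5  -> 0.6   (classical)
# -0.3 -> 1.29  (Phi > 1: counteracting but smaller tertiary change)
# -1.5 -> -1.8  (Phi < 0: counteracting and larger tertiary change)

# With a smaller structural factor, the same energetics stays below 1.
print(round(phi_from_contacts(0.5, dg_a, -0.3), 2))   # 0.71
```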

However, our model is rather general and also captures the possibility that nonclassical Φ-values arise from shifts in the free energy of the denatured state. For example, if a mutation only lowers the free energy of the denatured state, the changes Δg_α and Δg_N can have opposite signs, which gives a negative Φ-value according to eq. (8). In contrast, the traditional structural interpretation of Φ-values fails if mutations shift the free energy of the denatured state [48].

In this simple example, a mutation in the α-helix affects only a single structural element formed in the TSE: the α-helix itself. In general, mutations may affect several microstructures of the TSE. Provided the free energies g_i of the microstructures are additive, eq. (8) then generalizes to

$$\Phi = \frac{\sum_i \chi_i\,\Delta g_i}{\Delta g_N}, \qquad \chi_i = -\frac{\partial \ln k}{\partial g_i}$$
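For the generalized expression, χ_i can be obtained numerically for any model rate k(g_1, g_2, ...). The sketch below uses a central finite difference, and its helper names are hypothetical. The example plugs in the two-route rate of eq. (7). For that rate, χ_α + χ_β = 1, and the general formula reduces to eq. (8) when only g_α changes.

```python
import math
from typing import Callable, Sequence


def chi_numeric(ln_k: Callable[[Sequence[float]], float],
                g: Sequence[float], i: int, h: float = 1e-5) -> float:
    """chi_i = -d ln k / d g_i by a central finite difference."""
    gp, gm = list(g), list(g)
    gp[i] += h
    gm[i] -= h
    return -(ln_k(gp) - ln_k(gm)) / (2 * h)


def phi_general(ln_k, g, dg, dg_N):
    """Phi = sum_i chi_i dg_i / dg_N (small perturbations, additive g_i)."""
    return sum(chi_numeric(ln_k, g, i) * dg[i] for i in range(len(g))) / dg_N


def ln_k_two_routes(g: Sequence[float]) -> float:
    """ln of eq. (7) with g = (g_alpha, g_beta)."""
    return math.log(0.5 * (math.exp(-g[0]) + math.exp(-g[1])))


g = [4.0, 6.0]   # hypothetical barrier energies (units of RT)
print([round(chi_numeric(ln_k_two_routes, g, i), 3) for i in range(2)])  # sums to 1
print(phi_general(ln_k_two_routes, g, dg=[0.05, 0.0], dg_N=0.10))        # eq. (8)
```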

Mutations in the α-helix of CI2

To model the folding kinetics of CI2, we must consider at least four substructural units: the α-helix and the three strand pairings β₂β₃, β₃β₄, and β₁β₄. These substructures correspond to contact clusters on the native contact map of CI2 (see Fig. 2). The model energy landscape of CI2 is therefore more complex than the landscape of the simple four-state model given above. However, eq. (8) also holds for the helix of CI2 under two assumptions: (1) in each state of the transition state ensemble, the helix is either fully formed or not formed, and (2) the helix does not form tertiary contacts in the transition state ensemble. Under these assumptions, the free energy contribution of the helix to a state of the transition state ensemble in which the helix is formed is simply g_α, and χ_α has the same interpretation as above.²

To test eq. (8), we consider the 20 single-residue mutations in the CI2 helix [2]. We estimate the change in intrinsic helix stability, Δg_α, from helicities predicted by the program AGADIR [44-46] (see Table 1). The experimentally measured change in folding rate for these mutations, ln(k_wt/k_mut), correlates with Δg_α with a coefficient r = 0.83. The experimentally determined Φ-values correlate with Δg_α/ΔG_N with r = 0.85 (see Fig. 3). According to eq. (8), the change in ln k is proportional to Δg_α and the Φ-values are proportional to Δg_α/ΔG_N, both with proportionality constant χ_α. From the two linear fits shown in Fig. 3 we obtain the estimate χ_α = 0.88 ± 0.12. We estimated the errors for χ_α with a jackknife method in which up to two data points are deleted randomly from the data set (see figure caption). This estimate for χ_α indicates that the helix is almost fully formed in the transition state ensemble. Consistent with this interpretation, MD unfolding simulations indicate that a fraction of 0.91 ± 0.14 of the helical residues are structured in the transition state ensemble [21].
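The two fits through the origin can be reproduced from the rounded values in Table 1. The NumPy sketch below computes the slopes and Pearson correlation coefficients. It then computes the spread of the bottom-panel slope over all subsets with up to two data points deleted. Enumerating every such subset is an assumption of this sketch, since the text describes random deletion. The sketch also does not attempt to reproduce how the two fits were combined into χ_α = 0.88 ± 0.12.

```python
import itertools

import numpy as np

# Table 1 (rounded values as printed): mutation -> (RT ln(k_wt/k_mut), dG_N,
# Phi_exp, dg_alpha, dg_alpha/dG_N); energies in kcal/mol.
TABLE1 = {
    "S12G": (0.23, 0.8, 0.29, 0.28, 0.35),
    "S12A": (0.38, 0.89, 0.43, 0.14, 0.16),
    "E14Q": (0.36, 0.29, 1.23, 0.54, 1.86),
    "E14D": (0.10, 0.52, 0.2, 0.08, 0.15),
    "E14N": (0.53, 0.7, 0.75, 0.54, 0.77),
    "E15Q": (0.25, 0.47, 0.53, 0.56, 1.19),
    "E15D": (0.16, 0.74, 0.22, 0.13, 0.18),
    "E15N": (0.57, 1.07, 0.53, 0.57, 0.53),
    "A16G": (1.15, 1.09, 1.06, 0.82, 0.75),
    "K17A": (0.14, 0.49, 0.28, 0.04, 0.08),
    "K17G": (0.87, 2.32, 0.38, 0.80, 0.34),
    "K18G": (0.68, 0.99, 0.7, 0.75, 0.76),
    "V19A": (-0.13, 0.49, -0.26, -0.41, -0.84),
    "I20V": (0.52, 1.3, 0.4, 0.14, 0.11),
    "L21A": (0.33, 1.33, 0.25, -0.01, -0.01),
    "L21G": (0.48, 1.38, 0.35, 0.26, 0.19),
    "Q22G": (0.07, 0.6, 0.12, 0.04, 0.07),
    "D23A": (-0.23, 0.96, -0.25, -0.41, -0.43),
    "K24A": (-0.23, 0.65, -0.35, 0.11, 0.17),
    "K24G": (0.31, 3.19, 0.1, 0.12, 0.04),
}
rt_ln, dG_N, phi, dg_a, ratio = np.array(list(TABLE1.values())).T


def slope_through_origin(x, y):
    """Least-squares slope of y = s * x (no intercept)."""
    return float(np.dot(x, y) / np.dot(x, x))


def pearson_r(x, y):
    return float(np.corrcoef(x, y)[0, 1])


top = slope_through_origin(dg_a, rt_ln)     # Fig. 3, top panel
bottom = slope_through_origin(ratio, phi)   # Fig. 3, bottom panel
print(f"top:    slope {top:.2f}, r {pearson_r(dg_a, rt_ln):.2f}")
print(f"bottom: slope {bottom:.2f}, r {pearson_r(ratio, phi):.2f}")

# Spread of the bottom slope when up to two data points are deleted
# (all subsets enumerated; an assumption of this sketch).
n = len(phi)
slopes = []
for n_drop in range(3):
    for drop in itertools.combinations(range(n), n_drop):
        keep = [i for i in range(n) if i not in drop]
        slopes.append(slope_through_origin(ratio[keep], phi[keep]))
print(f"bottom slope over subsets: {min(slopes):.2f} to {max(slopes):.2f}")
```

With the rounded table values, the two slopes come out close to the 0.98 and 0.71 quoted in the Fig. 3 caption.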

²These two assumptions are clearly simplifying. Based on unfolding simulations, Daggett et al. [21] argue for a crucial tertiary interaction between the residues Ala16 of the α-helix and Ile49 of the β-sheet in the transition state ensemble of CI2. In contrast, Lazaridis and Karplus [23] found that "the number of contacts made by the Ala side chain [in the TSE] ... depend[s] primarily on the presence of the helix and not on interactions with β-strands."
Discussion



Our model gives a physical explanation for nonclassical Φ-values, but an alternative explanation is in terms of experimental errors. Sanchez and Kiefhaber [47] have observed that mutations with nonclassical Φ-values often have relatively small changes ΔG_N in stability. Since ΔG_N appears in the denominator of the expression for Φ, nonclassical values can arise when a mutation has little effect on the protein stability. Sanchez and Kiefhaber argue that unavoidable experimental errors may be responsible for the unusual Φ-values, and that Φ-values for mutations with ΔG_N < 1.7 kcal/mol are unreliable. Others have argued that this error threshold should be considerably smaller, around 0.6 kcal/mol [16,48]. The analysis of Sanchez and Kiefhaber assumes that different mutations at a given residue position should lead to the same 'true' Φ-value for this position. Our model gives a different interpretation: different mutations at a given position can affect the energy landscape in different ways. For example, according to the AGADIR estimates in Table 1, the mutation E14Q in the CI2 helix changes the intrinsic helix stability considerably (Δg_α = 0.54 kcal/mol), while E14D barely affects it (Δg_α = 0.08 kcal/mol).
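The error argument can be illustrated with standard first-order error propagation for Φ = RT ln(k_wt/k_mut)/ΔG_N. The sketch assumes a hypothetical, uncorrelated uncertainty of 0.1 kcal/mol on both the numerator and the denominator; this value is not taken from [47]. It applies this uncertainty to two rows of Table 1, one with a small ΔG_N and one with a large ΔG_N.

```python
import math


def phi_with_error(rt_ln: float, dG_N: float, sigma: float = 0.1):
    """Return (Phi, sigma_Phi) for Phi = rt_ln / dG_N.

    First-order propagation with an assumed, uncorrelated uncertainty
    `sigma` (kcal/mol) on both rt_ln and dG_N.
    """
    phi = rt_ln / dG_N
    sigma_phi = math.sqrt(sigma**2 + (phi * sigma) ** 2) / abs(dG_N)
    return phi, sigma_phi


# Two rows of Table 1: small vs. large stability change.
for name, rt_ln, dG_N in [("E14Q", 0.36, 0.29), ("K24G", 0.31, 3.19)]:
    phi, err = phi_with_error(rt_ln, dG_N)
    print(f"{name}: Phi = {phi:.2f} +/- {err:.2f}")
```

The computed Φ for E14Q differs slightly from the tabulated 1.23 because the inputs are rounded. Under this assumed error, its uncertainty is more than ten times larger than that for K24G.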

Our model can explain isolated nonclassical Φ-values, such as the four in the α-helix of CI2 (see Table 1). They are "isolated" in that they are interspersed among classical Φ-values within a local region of the protein. In other cases, nonclassical Φ-values are clustered together within a given region of the protein. In the second α-helix of ACBP, for example, 7 Φ-values are clearly negative, while the other 6 Φ-values are close to 0. Clustered nonclassical Φ-values have previously been explained in terms of parallel flow processes on energy landscapes that are slightly more complex than the ones we considered here [49]. That is, mutations that destabilize a particular substructure can cause a backflow on the energy landscape into faster flow channels, leading to an increase in the folding rate and negative Φ-values.

We have considered here the α-helix of CI2 to illustrate our structural interpretation of Φ-values. One reason is that the helix is very well characterized, i.e., a large number of Φ-values is available. Another reason is that these Φ-values cover a wide range, from −0.35 to 1.23. Two other well-characterized helices are the α-helices of protein L [9] and protein G [10]. Fifteen single-residue mutations have been considered in the protein L helix. One of the Φ-values is −0.39, whereas the others span a rather narrow range from −0.05 to 0.28 [9]. Similarly, one out of 9 Φ-values for the helix of protein G is −0.81, whereas the others range from 0.05 to 0.55. In both cases, our model reproduces the clearly negative, nonclassical Φ-value, which leads to relatively high correlation coefficients of 0.58 and 0.81 between the experimental and theoretical Φ-value distributions. But because the other Φ-values lie in a rather narrow range, the statistical uncertainties from experimental and modeling errors are high, and χ_α cannot be determined reliably.

Summary 

Φ-values give information about the routes of protein folding. The central question is: what information do they give? Previous modeling has been limited in certain ways. First, some models treat only topological aspects of folding, and therefore cannot explain how single-site mutations can have the large effects on folding rates that are often observed. Second, current models usually make plausible but ad hoc assumptions about folding routes, transition states, and reaction coordinates. Protein folding is sufficiently different from simpler reactions that some of these assumptions are not likely to be valid. In particular, Φ-values are often assumed to reflect only structural information about transition states. Here we present a more rigorous approach for interpreting Φ-values, and we show that Φ-values have both structural and energetic components. We also show that this approach gives a consistent interpretation of mutational experiments on the CI2 helix.
Table 1: Data for single-residue mutations in the α-helix of CI2

| mutation | RT ln(k_wt/k_mut) (kcal/mol) | ΔG_N (kcal/mol) | Φ_exp | Δg_α (kcal/mol) | Δg_α/ΔG_N |
|---|---|---|---|---|---|
| S12G | 0.23 | 0.8 | 0.29 | 0.28 | 0.35 |
| S12A | 0.38 | 0.89 | 0.43 | 0.14 | 0.16 |
| E14Q | 0.36 | 0.29 | 1.23 | 0.54 | 1.86 |
| E14D | 0.10 | 0.52 | 0.2 | 0.08 | 0.15 |
| E14N | 0.53 | 0.7 | 0.75 | 0.54 | 0.77 |
| E15Q | 0.25 | 0.47 | 0.53 | 0.56 | 1.19 |
| E15D | 0.16 | 0.74 | 0.22 | 0.13 | 0.18 |
| E15N | 0.57 | 1.07 | 0.53 | 0.57 | 0.53 |
| A16G | 1.15 | 1.09 | 1.06 | 0.82 | 0.75 |
| K17A | 0.14 | 0.49 | 0.28 | 0.04 | 0.08 |
| K17G | 0.87 | 2.32 | 0.38 | 0.80 | 0.34 |
| K18G | 0.68 | 0.99 | 0.7 | 0.75 | 0.76 |
| V19A | -0.13 | 0.49 | -0.26 | -0.41 | -0.84 |
| I20V | 0.52 | 1.3 | 0.4 | 0.14 | 0.11 |
| L21A | 0.33 | 1.33 | 0.25 | -0.01 | -0.01 |
| L21G | 0.48 | 1.38 | 0.35 | 0.26 | 0.19 |
| Q22G | 0.07 | 0.6 | 0.12 | 0.04 | 0.07 |
| D23A | -0.23 | 0.96 | -0.25 | -0.41 | -0.43 |
| K24A | -0.23 | 0.65 | -0.35 | 0.11 | 0.17 |
| K24G | 0.31 | 3.19 | 0.1 | 0.12 | 0.04 |

Experimental data for the folding rates k_wt and k_mut of wildtype and mutants, the stability changes ΔG_N, and the Φ-values are from Itzhaki et al. [2]. The change Δg_α = ln(P_α^wt/P_α^mut) in the 'intrinsic helix stability' g_α is estimated from helicities P_α predicted by AGADIR [44-46]. The wildtype sequence of the 13-residue helix is SVEEAKKVILQDK. Helicities were calculated at the experimental temperature of 298 K, pH 6.25, and ionic strength 0.03 M, with an acetylated N-terminus and amidated C-terminus of the peptide to avoid terminal charges. The energetic quantities RT ln(k_wt/k_mut), ΔG_N, and Δg_α are given in units of kcal/mol.
Figure 1: The native structure of CI2 consists of a four-stranded β-sheet packed against an α-helix (PDB file 1COA).
Figure 2: Contact matrix of CI2. Each black dot represents a contact between two amino acids in the native structure, defined by a distance of less than 6 Å between the Cα or Cβ atoms of the amino acids. The four large clusters of contacts correspond to the main structural elements of CI2: the α-helix and the β-strand pairings β₂β₃, β₃β₄, and β₁β₄. The few 'isolated' contacts represent either turns or tertiary interactions between the α-helix and the β-sheet.
Figure 3: Correlation analysis for mutations in the CI2 helix. (Top) ln(k_wt/k_mut) versus Δg_α = ln(P_α^wt/P_α^mut), estimated from helicities P_α predicted by AGADIR (see Table 1). The correlation coefficient r is 0.83, and the slope of the fitted line through the origin is 0.98. The slope of this line is an estimate for the parameter χ_α of eq. (8). For subsets of the data generated by deleting up to two data points, the correlation coefficient r varies from 0.77 to 0.93, and the slope varies from 0.87 to 1.07. (Bottom) Φ_exp versus Δg_α/ΔG_N. The correlation coefficient r is 0.85, and the slope of the fitted line through the origin is 0.71. For data subsets generated by deleting up to two data points, r varies from 0.79 to 0.90, and the slope varies from 0.64 to 0.90.